SUMMARY


This is the last in a series of technical reports concerned with
mathematical approaches to instructional sequence optimization in
instructional systems. The problem treated here is very closely related
to that treated by Smallwood and Sondik (4). Both papers deal with
Markov decision processes where the true state of the system is not
known with certainty. Hence the state of the system is characterized
by a probability vector. Each action yields an expected reward,
transforms the system to a new state, and yields an observable outcome.
One wishes to determine an action for each probability state vector so
as to maximize the total expected reward. Smallwood and Sondik (4) solve
this problem exactly for a finite time horizon. This report treats the
infinite time horizon with a discount factor, using a partial
N-dimensional Maclaurin series to approximate the total optimal reward
as a function of the probability state vector. While this model was
developed for computer-aided instruction, it is applicable to other
situations as well. This model also is of considerable theoretical
value.
ABSTRACT


This paper describes a system that may be in any one of states
1, 2, ..., N. The true state of the system is not known with certainty
and consequently is described by a probability vector. At each stage
an action must be chosen from a finite set. Each possible action
returns an expected reward, transforms the system to a new state in
accordance with a Markov transition matrix, and yields an observable
outcome. It is required to determine an action for each possible
state vector in order to maximize the total expected reward over an
infinite time horizon under a discount factor, $\beta$, where $0 < \beta < 1$.

The problem of finding the total maximum discounted reward as
a function of the probability state vector may be formulated as a
linear program with an infinite number of constraints. The reward
function may be expressed as an N-dimensional Maclaurin series, and
in this paper it is approximated by a partial series consisting of
terms up to degree n. The coefficients in this series are also
determined as an optimal solution to a linear program with an infinite
number of constraints. A sequence of related finitely constrained
linear programs is solved, generating a sequence of solutions
that converge to a local minimum for the infinitely constrained
program. It is an open question as to whether this local minimum is
actually a global minimum. However, it should be noted that the
function being approximated is convex and consequently has the
property that any local minimum is a global one as well.
PARTIALLY OBSERVABLE MARKOV DECISION
PROCESSES OVER AN INFINITE PLANNING
HORIZON WITH DISCOUNTING


I. Introduction

This paper describes a system that may be in any one of states
1, 2, ..., N. The true state of the system is not known with certainty
and consequently is described by a probability vector. At each stage
an action must be chosen from a finite set. This action returns an
expected reward, transforms the system to a new (but not necessarily
different) state according to a Markov process, and yields an
observable outcome. The problem addressed here is that of determining an
action for each possible state vector in order to maximize the total
expected reward over an infinite horizon under a discount factor, $\beta$,
where $0 < \beta < 1$.

Smallwood and Sondik (4) have treated this problem for the
finite horizon case without a discount factor and have determined that
the total maximum expected reward is a piecewise linear function of
the probability state vector. Their results can be trivially extended
to include the discount case.

The observable state case, that is, the case where the true
state of the system is known with certainty, has been treated extensively.
For both the finite and infinite horizon under a discount factor, Howard (1)
developed a policy improvement routine for determining an optimal action
and the optimal cost for each state.
II. Formulation


In this formulation, the notation of Smallwood and Sondik will
be used. It is assumed that this system can be modeled by an N-state
discrete time Markov decision process.

The observed state of the system is characterized by a probability
vector $\pi$, where $\pi_i$ is the probability that the true state of the
system is $i$.

At each point in time an action must be selected from a finite
set. Associated with an action, $a$, is a probability transition matrix
$P^a$, where $p^a_{ij}$ is the conditional probability that the system will
make its next transition to state $j$ given that the current state is $i$
and action $a$ is taken. An observed outcome follows each action, with
$r^a_{j\theta}$ denoting the probability of observing output $\theta$ given
that the new state of the system is $j$ and action $a$ was taken. In
addition an immediate reward $w^a_{ij\theta}$ is incurred if action $a$ is
taken, output $\theta$ is observed, and the system makes the transition
from state $i$ to state $j$. Thus if action $a$ is taken and output
$\theta$ is observed, the new state is $\pi'$ where

$$ \pi'_j = \frac{\sum_i \pi_i p^a_{ij} r^a_{j\theta}}{\sum_{i,j} \pi_i p^a_{ij} r^a_{j\theta}} \tag{1} $$

The above transformation is summarized by

$$ \pi' = T(\pi/a, \theta) \tag{2} $$
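
The state-vector transformation just described can be sketched in a few lines of Python. This is a minimal illustration for one fixed action, not code from the report: the function name `belief_update` and the array layout (`P[i][j]` for the transition probabilities, `R[j][theta]` for the output probabilities) are assumptions made for the example.

```python
from typing import List, Sequence


def belief_update(pi: Sequence[float],
                  P: Sequence[Sequence[float]],
                  R: Sequence[Sequence[float]],
                  theta: int) -> List[float]:
    """Return the new state vector after one action and one observed output.

    P[i][j] is the transition probability from state i to state j and
    R[j][theta] is the probability of output theta in new state j, both
    for a single fixed action (hypothetical layout for this sketch).
    """
    n = len(pi)
    # Joint probability of moving to state j and observing theta.
    joint = [sum(pi[i] * P[i][j] for i in range(n)) * R[j][theta]
             for j in range(n)]
    total = sum(joint)  # probability of observing theta at all
    if total == 0.0:
        raise ValueError("output theta has probability zero under pi")
    return [x / total for x in joint]
```

The normalizing constant `total` is the probability of the observed output, so the update is undefined for outputs that cannot occur; the sketch raises an error in that case rather than guessing a convention.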

A policy is a rule that assigns an action to each possible state
vector. It is required to find a policy that maximizes the expected
discounted rewards over all periods for each possible state vector. Let
$V(\pi)$ be the total discounted reward associated with such a policy.

Then $V(\pi)$ must satisfy the following recursive equation.


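$$ V(\pi) = \max_a \Big[ \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{\theta} \pi_i p^a_{ij} r^a_{j\theta} \big( w^a_{ij\theta} + \beta V[T(\pi/a,\theta)] \big) \Big] \tag{3} $$

Letting

$$ q^a_i = \sum_{j,\theta} p^a_{ij} r^a_{j\theta} w^a_{ij\theta} \tag{4} $$

equation (3) is simplified somewhat to equation (5):

$$ V(\pi) = \max_a \Big[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} V[T(\pi/a,\theta)] \Big] \tag{5} $$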


Once the function for $V(\pi)$ is known, an optimal action for $\pi$ can
be determined as one which maximizes the right hand side of (5).
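
To make the recursion concrete, the sketch below evaluates the right hand side of the recursive equation by brute force, nesting it to a fixed depth with a terminal value of zero. It is only an illustration under stated assumptions: the names `expected_rewards` and `backup`, and the nested-list layout `P[a][i][j]`, `R[a][j][theta]`, `W[a][i][j][theta]`, are hypothetical, and the cost grows exponentially with the depth.

```python
def expected_rewards(P, R, W):
    """q[a][i]: expected immediate reward of action a in true state i."""
    return [[sum(P[a][i][j] * R[a][j][t] * W[a][i][j][t]
                 for j in range(len(P[a]))
                 for t in range(len(R[a][j])))
             for i in range(len(P[a]))]
            for a in range(len(P))]


def backup(pi, depth, P, R, q, beta):
    """Return (value, best action) after `depth` nested maximizations.

    Depth 0 returns the assumed terminal value 0 and no action.
    """
    if depth == 0:
        return 0.0, None
    n = len(pi)
    best_value, best_action = float("-inf"), None
    for a in range(len(P)):
        value = sum(pi[i] * q[a][i] for i in range(n))
        for t in range(len(R[a][0])):
            joint = [sum(pi[i] * P[a][i][j] for i in range(n)) * R[a][j][t]
                     for j in range(n)]
            p_theta = sum(joint)  # probability of observing output t
            if p_theta > 0.0:
                nxt = [x / p_theta for x in joint]
                value += beta * p_theta * backup(nxt, depth - 1, P, R, q, beta)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```

The inner sum over states and outputs collapses to the output probability times the value at the updated state vector, which is why only `p_theta` multiplies the recursive call.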
III. A Learning Example


As an illustration, it will be shown how the system described
in the previous section may be applied to the human learning process.

Consider a course which is given in several levels of instruction.
The levels are denoted 1, 2, ..., N with N being the easiest and
1 the hardest. The structure of the levels is a definite hierarchy in
the sense that if a student knows the material at level $i$ he must also
know the material at any level $j \ge i$. Several examples where this
situation may apply follow:

The first situation is one where the material covered at one
level includes all that covered at preceding levels, plus some additional
material. An example of this is a program developed at Behavioral
Technology Laboratories (BTL) to teach students Kirchhoff's Laws. This
course is comprised of eleven levels, from the lowest level, which defines
the units for voltage, current and resistance, up to the highest level, which
deals with the application of Ohm's Law and Kirchhoff's voltage and current
laws in complex networks. Another program developed at BTL is a short
course in trigonometry consisting of five levels. At the lowest level
students are given the definitions of the six basic trigonometric ratios.
Then the student is given a right triangle in which the lengths of the
sides are determined by a random number generator and the student is
asked to determine these ratios for one of the acute angles. Succeeding
levels deal with material on relationships between these ratios and
problems testing the student's knowledge of these relationships.

A second situation is one where the material and problems covered
at a particular level are virtually the same as the immediately preceding
level except more clues and hints are given at the preceding level. A
good example of this is a version of the Kirchhoff's laws program considered
earlier at BTL in which problems would be given in levels as follows:

1. Problems are given in steps with cues and knowledge of
results at each step.

2. Problems are given in steps with no cues or knowledge of
results at each step.

3. The student solves problems in steps but he chooses the
steps.

4. The student is simply given problems and asked to solve
them.

A third situation is one in which the student is to be drilled
in a skill in order that he be able to perform it rapidly. Thus the
exercises are virtually the same at all levels but the time constraints
are tighter at the higher levels. In the BTL intercept trainer for
the radar intercept observer function, the student is trying to fire
a missile at the nose of a target and then turn around and fire another
missile at the tail of that aircraft. The first missile is a radar
guided missile fired when in the forward quarter and the second a heat
seeker fired when in the rear quarter of the enemy aircraft. He is
given a radar reading and must correct his angle of approach so as to
be on a lead collision course that will ensure a high hit probability
when he fires the missile. At higher levels the student is given such
problems at faster aircraft speeds.

Note, however, that the assumption given for this model would not
be applicable for the situation where a given level did not use certain
material introduced at preceding levels.

A student  is  in  state  i if  he  knows  the  material  of  level  i 
but  not  at  any  level  more  difficult  than  i and  in  state  N+1  if  he  does 
not  know  the  material  at  any  level. 

There are N actions, and action $i$ consists of instructing the
student in the material of level $i$ and then giving the student a test
on that material. For each action there are two possible outcomes:
either the student passes the test or he fails it. The objective is
to develop an adaptive instructional sequence so that the student
demonstrates knowledge of the material at level 1 as quickly as possible.
Knowledge at level 1 is demonstrated by passing a test on the material
at level 1. The reward, $w^a_{ij\theta}$, would be the negative of the expected
time it would take to obtain instruction at level $a$ when the system goes
from state $i$ to state $j$ and $\theta$ (success or failure at $a$) is observed.

For completeness a trap state $\varphi$ would be needed. The student goes to
state $\varphi$ with probability one once he successfully completes the material
at level 1. The only action in state $\varphi$ is to do nothing, which yields
a zero reward and keeps the student in state $\varphi$ with probability one.

Wollmer (6) treats the more restricted problem where $p^a_{ij} = 0$
unless $i = j$, or $i = a$ and $j = i+1$. Thus if a student is in state $i$, he
remains in state $i$ unless he receives instruction at level $i+1$, in
which case he either remains in state $i$ or advances to state $i+1$. This
would not allow the possibility of forgetting.

Other situations where partially observable Markov decision
processes occur are in machine replacement, decoding from sources
transmitting over a noisy channel, medical diagnosis, and searching for a
moving object.

Note that if the assumption of a strict hierarchy in levels
were dropped, the set of states would expand from $N+2$ to $2^N+1$ including
the trap state.
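
The growth in the state set can be checked by direct enumeration. The short sketch below is an illustration only; the function names are hypothetical, and the unrestricted case is modeled, as an assumption, by letting a student know any subset of the N levels.

```python
from itertools import combinations


def hierarchical_state_count(N: int) -> int:
    """States 1..N, the 'knows nothing' state N+1, and the trap state."""
    return N + 2


def unrestricted_state_count(N: int) -> int:
    """One state per subset of known levels, plus the trap state."""
    subsets = [c for r in range(N + 1)
               for c in combinations(range(1, N + 1), r)]
    return len(subsets) + 1
```

For an eleven-level course the hierarchy assumption keeps the model at 13 states, whereas dropping it gives 2049.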




IV. The Maximum Reward Function

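In this section it will be shown that a maximum reward function
exists and that it is a convex function of the state vector $\pi$.

Let $V_n(\pi)$ be the maximum reward function for the n period
horizon. Then

$$ V_n(\pi) = \max_a \Big[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} V_{n-1}[T(\pi/a,\theta)] \Big] \tag{6} $$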


Smallwood and Sondik (4) have shown that $V_n(\pi)$ is*

1. Convex

2. Piecewise linear

* While Smallwood and Sondik assume $\beta = 1$, their results hold for
$0 < \beta \le 1$.

It will be shown that $\lim_{n \to \infty} V_n(\pi)$ exists and is convex in $\pi$.

Define $f_n$ so that $|V_n(\pi) - V_{n-1}(\pi)| \le f_n$ for all $\pi$ and $f_n$ is the
smallest real number with this property, where $V_0(\pi) = 0$. The $f_n$'s are
well defined since all $V_n(\pi)$ are bounded above and below.

Lemma 1: $f_{n+1} \le \beta f_n$.

Proof: Choose $a(\pi)$ as the action that maximizes the right
hand side of (6) for $V_{n+1}(\pi)$ if $V_{n+1}(\pi) > V_n(\pi)$, or for $V_n(\pi)$
otherwise. Then, with $a = a(\pi)$,

$$ |V_{n+1}(\pi) - V_n(\pi)| \le \beta \Big| \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} \big( V_n[T(\pi/a,\theta)] - V_{n-1}[T(\pi/a,\theta)] \big) \Big| \le \beta f_n . $$

Corollary 1: For $n^* > n$, $|V_{n^*}(\pi) - V_n(\pi)| \le \epsilon(n)$
where $\epsilon(n) \to 0$.

Proof: From Lemma 1, $f_n \le \beta^{n-1} f_1$ and consequently

$$ |V_{n^*}(\pi) - V_n(\pi)| \le \sum_{i=n+1}^{n^*} f_i \le \beta^n f_1 \sum_{i=0}^{\infty} \beta^i = \beta^n f_1/(1-\beta) . $$
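
The geometric decay used in this argument gives a simple way to estimate how many periods are needed before the finite horizon value is within a chosen tolerance of its limit. The sketch below assumes the bound takes the form of a first-period difference `f1` times the discount factor raised to the power n, divided by one minus the discount factor; the function names are hypothetical, and `f1` has to be supplied by the user (for instance from the largest absolute one-period expected reward).

```python
def tail_bound(beta: float, f1: float, n: int) -> float:
    """Geometric-series bound on the change in value after period n."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return beta ** n * f1 / (1.0 - beta)


def horizon_for(beta: float, f1: float, tol: float) -> int:
    """Smallest n whose tail bound falls below the positive tolerance tol."""
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    n = 0
    while tail_bound(beta, f1, n) >= tol:
        n += 1
    return n
```

Because the bound shrinks by the factor `beta` at every step, the loop terminates after a number of iterations proportional to the logarithm of the tolerance.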


Theorem 1: The function $V_n(\pi)$ is absolutely convergent.

Proof: Choose any particular $\pi$. By Corollary 1, the sequence
$V_n(\pi)$ is bounded above and below and hence has an infinite convergent
subsequence with limit $V^*(\pi)$. Choose $\epsilon > 0$ and $n$ such that $\epsilon(N) < \epsilon$
for $N \ge n$, where $\epsilon(n)$ is as defined in Corollary 1. For any $N \ge n$ and
$\bar{n} > n$ in the convergent subsequence, $|V_N(\pi) - V_{\bar{n}}(\pi)| < \epsilon$ and consequently
$|V_N(\pi) - V^*(\pi)| \le \epsilon$. Since $n$ is independent of $\pi$, the theorem is proven.

Thus $V(\pi) = \lim_{n \to \infty} V_n(\pi)$ is well defined.

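Theorem 2: $V(\pi)$ is convex in $\pi$.

Proof: Define $f(V, \pi^1, \pi^2) = V(\tfrac{1}{2}\pi^1 + \tfrac{1}{2}\pi^2) - \tfrac{1}{2}V(\pi^1) - \tfrac{1}{2}V(\pi^2)$.
Assume $V(\pi)$ is not convex and choose $\pi^1$ and $\pi^2$ such that $f(V, \pi^1, \pi^2) =
k > 0$. Choose $n$ such that $N > n$ implies $|V_N(\pi) - V(\pi)| < k/2$. Then
$|f(V_N, \pi^1, \pi^2) - f(V, \pi^1, \pi^2)| < k$. Thus $f(V_N, \pi^1, \pi^2) > 0$, which is
impossible since $V_N(\pi)$ is convex.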

Note that the piecewise linear property of $V_n(\pi)$ does not imply
piecewise linearity of $V(\pi)$, as any continuous function may be expressed
as the limit of a sequence of piecewise linear functions.
V. Linear Program Formulation


In the case of the observable finite state Markov decision
processes with a discount factor, the problem of finding a maximum
return for each state may be formulated as a linear program. The
development of this may be found in Ross (6). In this section it is
shown that a modification of this formulation extends to the problem
formulated in Section II. Portions of the development which are similar
to the finite state case will be outlined but without rigorous proofs.

Consider the set $B$ of all continuous bounded functions defined
on $S = \{ \pi : \pi_i \ge 0 \text{ all } i, \ \sum_i \pi_i = 1 \}$. Let the operator $A$ be defined on
this set as follows.


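$$ Au(\pi) = \max_a \Big[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} u[T(\pi/a,\theta)] \Big] \tag{7} $$

Note that

1. $u \le v \Rightarrow Au \le Av$

2. $Au \in B$ for all $u \in B$

3. $A: B \to B$ is a contraction mapping on $B$.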

The operator $A$ is the optimal return function for the one period
problem in which a terminal reward $u(\pi)$ is given for the terminal state.
Since $A: B \to B$ is a contraction mapping, it has a unique fixed point,
$v = Av = \lim_{n \to \infty} A^n u$ for any $u \in B$. By Equation (3), this unique fixed point
must be the optimal reward function. Let us consider any $u$ such that
$Au \le u$. Then $u \ge Au \ge A^2 u \ge \cdots \ge \lim_{n \to \infty} A^n u = v$. Thus the optimal return
function $v$ minimizes $u(\pi)$ for each $\pi \in S$ among all functions $u$ satisfying $Au \le u$.

In the finite state case where the above conditions also hold,
it is noted that minimizing $u_i$ for each state $i$ may be accomplished by
minimizing the sum of the $u_i$'s. For this problem where such a sum
would be infinite, the average value of $u(\pi)$ may be minimized. Thus,
finding the function $u(\pi)$ is equivalent to solving the following
infinitely constrained program.

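Find $u$, min $Z$ such that

$$ Z = \int \cdots \int u(\pi) \, d\pi_N \, d\pi_{N-1} \cdots d\pi_1 \tag{8} $$

subject to

$$ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} u[T(\pi/a,\theta)] \le u(\pi) \quad \text{for all } a \text{ and all } \pi \text{ with } \pi_i \ge 0, \ \sum_i \pi_i = 1 \tag{9} $$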


Since the function $u(\pi)$ is continuous and defined on a closed
bounded set, it may be expressed in an N-dimensional Maclaurin series:

$$ V(\pi) = C_0 + \sum_{i_1, i_2, \ldots, i_N} C_{i_1 i_2 \ldots i_N} \, \pi_1^{i_1} \pi_2^{i_2} \cdots \pi_N^{i_N} \tag{10} $$

If $V(\pi)$ is expressed as such a series or approximated by a
partial series consisting of terms up to degree $n$, the coefficient of
$C_{i_1 i_2 \ldots i_N}$ in (8) is simply

$$ \int_0^1 \int_0^{1-\pi_1} \cdots \int_0^{1-\pi_1-\cdots-\pi_{N-1}} \pi_1^{i_1} \pi_2^{i_2} \cdots \pi_N^{i_N} \, d\pi_N \cdots d\pi_1 \tag{11} $$


In evaluating the integral the following lemma is needed.

Lemma 2: $$ \int_0^a (a-x)^m x^n \, dx = \frac{m! \, n!}{(m+n+1)!} \, a^{m+n+1} $$

Proof: Integrating by parts one obtains for the above integral

$$ \frac{n}{m+1} \int_0^a (a-x)^{m+1} x^{n-1} \, dx . $$

Applying this relationship recursively, one obtains

$$ \frac{n! \, m!}{(m+n)!} \int_0^a (a-x)^{m+n} \, dx = \frac{m! \, n!}{(m+n+1)!} \, a^{m+n+1} . $$

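From this lemma, expression (11) can be evaluated.

Theorem 3: The value of expression (11) is $\prod_{j=1}^{N} i_j! \Big/ \Big[ \sum_{j=1}^{N} (i_j + 1) \Big]!$

Proof: Integrating (11) with respect to $\pi_N$ gives

$$ \frac{1}{i_N + 1} \int_0^1 \int_0^{1-\pi_1} \cdots \int_0^{1-\sum_{j=1}^{N-2} \pi_j} \pi_1^{i_1} \cdots \pi_{N-1}^{i_{N-1}} \Big( 1 - \sum_{j=1}^{N-1} \pi_j \Big)^{i_N + 1} d\pi_{N-1} \cdots d\pi_1 . $$

Applying Lemma 2 with $a = 1 - \sum_{j=1}^{N-2} \pi_j$ and integrating with respect to $\pi_{N-1}$
yields

$$ \frac{i_N! \, i_{N-1}!}{(i_{N-1} + i_N + 2)!} \int_0^1 \int_0^{1-\pi_1} \cdots \int_0^{1-\sum_{j=1}^{N-3} \pi_j} \pi_1^{i_1} \cdots \pi_{N-2}^{i_{N-2}} \Big( 1 - \sum_{j=1}^{N-2} \pi_j \Big)^{i_{N-1} + i_N + 2} d\pi_{N-2} \cdots d\pi_1 . $$

Continual application of Lemma 2 yields $\prod_{j=1}^{N} i_j! \Big/ \Big[ \sum_{j=1}^{N} (i_j + 1) \Big]!$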
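
The repeated use of the lemma in this proof can be mirrored in code. The sketch below integrates out one variable at a time, carrying the accumulated power of the remaining slack, and compares the result with the factorial closed form. It is a check under the assumption that the integral runs over all N variables subject to their sum not exceeding one; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def closed_form(exponents) -> Fraction:
    """Product of i_j! divided by (sum of (i_j + 1))!."""
    numerator = 1
    for i in exponents:
        numerator *= factorial(i)
    return Fraction(numerator, factorial(sum(exponents) + len(exponents)))


def by_repeated_lemma(exponents, m: int = 0) -> Fraction:
    """Integrate out the last variable, then recurse on the rest.

    m is the power of (1 - sum of remaining variables) accumulated so far.
    """
    if not exponents:
        return Fraction(1)
    *rest, last = exponents
    factor = Fraction(factorial(m) * factorial(last), factorial(m + last + 1))
    return factor * by_repeated_lemma(rest, m + last + 1)
```

For example, with exponents (1, 2, 0) both routes give 1/360.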


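Thus if $V(\pi)$ is to be approximated by an $n$th degree polynomial
function in $\pi$, then substituting the expression of Theorem 3 and (1) in
(8) and (9) and rearranging terms yields:

Find $C_0$, $C_{i_1 i_2 \ldots i_N}$, min $Z$ such that

$$ Z = C_0 + \sum_{i_1, \ldots, i_N} \frac{\prod_{j=1}^{N} i_j!}{\big[ \sum_{j=1}^{N} (i_j + 1) \big]!} \, C_{i_1 i_2 \ldots i_N} \tag{12} $$

$$ (1-\beta) C_0 + \sum_{i_1, \ldots, i_N} C_{i_1 i_2 \ldots i_N} \Big( \prod_{j=1}^{N} \pi_j^{i_j} - \beta k^a_{i_1 i_2 \ldots i_N} \Big) \ge \sum_i \pi_i q^a_i \tag{13} $$

where

$$ k^a_{i_1 i_2 \ldots i_N} = \sum_{\theta} \Big( \sum_{i,l} \pi_i p^a_{il} r^a_{l\theta} \Big) \prod_{j=1}^{N} \big[ T_j(\pi/a,\theta) \big]^{i_j} \tag{14} $$

$$ T_j(\pi/a,\theta) = \frac{\sum_i \pi_i p^a_{ij} r^a_{j\theta}}{\sum_{i,l} \pi_i p^a_{il} r^a_{l\theta}} \tag{15} $$

for all $a$, all $\pi \ge 0$ such that $\sum_i \pi_i = 1$.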

Thus the problem of solving the program (8-9) with a multinomial
approximation of $u(\pi)$ becomes a linear program (12-15) with
an infinite number of constraints and unrestricted variables. Note
that the minimum value of Z obtained in the linear program (12-15)
would actually be larger than that obtained in the program (8-9).
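
For a fixed state vector and a fixed action, substituting a polynomial for u in constraint (9) gives a single inequality that is linear in the polynomial coefficients. The sketch below builds the coefficients of that inequality. It is an illustration derived from (9) and (10) under stated assumptions, not code from the report: the names `multi_indices` and `constraint_row`, the single-action layout `P[i][j]`, `R[j][theta]`, `q[i]`, and the convention that the constant term comes first are all hypothetical.

```python
from math import prod


def multi_indices(N: int, degree: int):
    """All exponent tuples (i_1, ..., i_N) with 1 <= i_1 + ... + i_N <= degree."""
    out = []

    def rec(prefix, remaining):
        if len(prefix) == N:
            if sum(prefix) >= 1:
                out.append(tuple(prefix))
            return
        for e in range(remaining + 1):
            rec(prefix + [e], remaining - e)

    rec([], degree)
    return out


def constraint_row(pi, P, R, q, beta, indices):
    """Return (coeffs, rhs) with coeffs . C >= rhs for one pi and one action.

    coeffs[0] multiplies the constant term; coeffs[1:] follow `indices`.
    """
    N = len(pi)
    coeffs = [1.0 - beta]
    for idx in indices:
        now = prod(pi[j] ** idx[j] for j in range(N))
        later = 0.0
        for t in range(len(R[0])):
            joint = [sum(pi[i] * P[i][j] for i in range(N)) * R[j][t]
                     for j in range(N)]
            p_theta = sum(joint)  # probability of output t
            if p_theta > 0.0:
                later += p_theta * prod((joint[j] / p_theta) ** idx[j]
                                        for j in range(N))
        coeffs.append(now - beta * later)
    rhs = sum(pi[i] * q[i] for i in range(N))
    return coeffs, rhs
```

Each call yields one row of the infinitely constrained linear program; the row is nonlinear in the state vector but linear in the unknown coefficients, which is what makes the problem a linear program.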
VI. Computational Procedure

Given an optimal solution to the linear program (12-15), consider
the set of constraints for which the $C_{i_1 i_2 \ldots i_N}$ are basic. If
the program was solved with these constraints only, the same solution
would be obtained and all other constraints would be satisfied. Thus,
while the program consists of an infinite number of constraints, only
a finite number need to be included provided the correct ones are chosen.
This will be taken advantage of by solving the program with a finite
subset of the constraints, introducing an unsatisfied constraint, then
dropping any that are not binding, and continuing until an optimal
solution is obtained.

Let the quantity $F(\pi, C)$ be defined as follows.

$$ F(\pi, C) = (1-\beta) C_0 + \sum_{i_1, \ldots, i_N} C_{i_1 i_2 \ldots i_N} \Big( \prod_{j=1}^{N} \pi_j^{i_j} - \beta k^a_{i_1 i_2 \ldots i_N} \Big) - \sum_{i=1}^{N} \pi_i q^a_i \tag{16} $$

The constraints (13) are equivalent to $F(\pi, C) \ge 0$ for all $\pi$. Thus if at
least one constraint is not satisfied for a given C vector, the value
of $\pi$ that minimizes $F(\pi, C)$ gives the most unsatisfied one.

The  procedure  for  solving  the  linear  program  (12-15)  is  given 
in  algorithm  1. 


Algorithm 1

1. Formulate the linear program with any finite subset of the
constraints in (13).

2. Solve the linear program for C.

3. Delete any constraints for which a slack variable is basic.

4. Solve the following non-linear program.

Find $\pi \ge 0$, min $Z'$ such that

$$ Z' = F(\pi, C) \tag{17} $$

$$ \sum_{i=1}^{N} \pi_i = 1 \tag{18} $$

If $Z' \ge 0$, terminate as C is optimal. Otherwise introduce the
constraint corresponding to the value of $\pi$ that optimizes (17-18) and
go back to Step 2.
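
The control flow of this procedure is a constraint-generation loop, and it can be sketched independently of any particular linear programming code. In the sketch below the two callbacks are hypothetical stand-ins supplied by the caller: `solve` is assumed to return a solution together with the constraints that are binding at it, and `separate` is assumed to return the smallest slack together with the constraint that attains it.

```python
def constraint_generation(solve, separate, constraints, tol=1e-9, max_rounds=100):
    """Solve, drop non-binding constraints, add the most violated one, repeat.

    solve(constraints)  -> (solution, binding_constraints)
    separate(solution)  -> (smallest_slack, constraint_attaining_it)
    """
    constraints = list(constraints)
    for _ in range(max_rounds):
        solution, binding = solve(constraints)      # Step 2
        constraints = list(binding)                 # Step 3
        slack, worst = separate(solution)           # Step 4
        if slack >= -tol:
            return solution                         # no violated constraint left
        constraints.append(worst)
    raise RuntimeError("no convergence within max_rounds")


# Toy usage: minimize x subject to x >= h(t) for every t on a grid.
def h(t):
    return 1.0 - (t - 3 / 10) ** 2


GRID = [k / 10 for k in range(11)]


def toy_solve(ts):
    x = max(h(t) for t in ts)
    return x, [t for t in ts if h(t) == x]


def toy_separate(x):
    t_worst = min(GRID, key=lambda t: x - h(t))
    return x - h(t_worst), t_worst
```

In the toy problem the loop starts from a single constraint, adds the constraint at t = 0.3 after the first round, and stops at x = 1. A real application would replace `toy_solve` with a linear programming routine and `toy_separate` with a search over probability vectors.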

A local optimum to (17-18) may be found by Algorithm 2.

Algorithm 2

1. Choose an arbitrary probability vector $\pi$ and evaluate $F(\pi, C)$.

2. Find an ordered pair $(i, j)$ such that increasing $\pi_i$ by $\epsilon$ and
decreasing $\pi_j$ by $\epsilon$ decreases $F(\pi, C)$ without violating $0 \le \pi_i \le 1$ and
$0 \le \pi_j \le 1$. If no such pair can be found, terminate as $\pi$ is a local
optimum.

3. Increase $\pi_i$ to $\bar{\pi}_i$ and decrease $\pi_j$ to $\bar{\pi}_j$ such that neither
the pair $(i, j)$ nor $(j, i)$ satisfies the conditions of Step 2. Then go
back to Step 2.

For finiteness, the $\epsilon$ of Step 2 would be chosen ahead of time.

There are several ways of performing Step 3 to find the new
values of $\pi_i$ and $\pi_j$. One efficient way is to first bracket $\bar{\pi}_i$ and $\bar{\pi}_j$
between $\pi_i'$, $\pi_j'$ and $\pi_i''$, $\pi_j''$ and continually reduce the difference between
these by a factor of one half, thus converging on a single point.

Initially $\pi_i'$ and $\pi_j'$ would be the current values of $\pi_i$ and $\pi_j$,
and $\pi_i'' = \pi_i' + \delta$, $\pi_j'' = \pi_j' - \delta$ where $\delta = \min[1 - \pi_i', \pi_j']$. Then consider
the pair $\bar{\pi}_i = \tfrac{1}{2}(\pi_i' + \pi_i'')$ and $\bar{\pi}_j = \tfrac{1}{2}(\pi_j' + \pi_j'')$. If $F(\bar{\pi}, C)$ is a local
minimum under the restriction that all components of $\pi$ other than $\pi_i$
and $\pi_j$ are held constant, then $\bar{\pi}$ is the desired point. Otherwise,
let $\bar{\pi}_i$ and $\bar{\pi}_j$ replace $\pi_i'$ and $\pi_j'$ if the direction of decrease is towards
$\pi_i''$ and $\pi_j''$, but let $\bar{\pi}_i$ and $\bar{\pi}_j$ replace $\pi_i''$ and $\pi_j''$ if the direction of
decrease is towards $\pi_i'$ and $\pi_j'$. If neither direction yields a decrease,
let $\bar{\pi}_i$ and $\bar{\pi}_j$ replace $\pi_i'$ and $\pi_j'$ if $F(\pi') > F(\pi'')$ but replace $\pi_i''$ and $\pi_j''$
otherwise. Step 3 would terminate when $\pi_i'' - \pi_i' < \epsilon_1$ where $\epsilon_1 < \epsilon$.
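
The pairwise exchange search with interval halving can be sketched as follows. This is a simplified illustration rather than the exact procedure: it works on any function `F` of a probability vector (the dependence on the coefficient vector is assumed to be folded into `F`), it decides the direction of decrease at the midpoint with a small forward probe, and it omits the tie-breaking rule for the case where neither direction decreases. All names are hypothetical.

```python
def shifted(pi, i, j, t):
    """Copy of pi with t moved from component j to component i."""
    out = list(pi)
    out[i] += t
    out[j] -= t
    return out


def line_search(F, pi, i, j, tol):
    """Halve the bracket [0, delta] on the amount moved from j to i."""
    lo, hi = 0.0, min(1.0 - pi[i], pi[j])
    while hi - lo >= tol:
        mid = 0.5 * (lo + hi)
        probe = 0.5 * tol  # keeps mid + probe inside the bracket
        if F(shifted(pi, i, j, mid + probe)) < F(shifted(pi, i, j, mid)):
            lo = mid  # still decreasing towards hi
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pairwise_descent(F, pi, eps=1e-3, tol=1e-6, max_moves=10000):
    """Local search over the probability simplex by pairwise exchanges."""
    pi = list(pi)
    N = len(pi)
    for _ in range(max_moves):
        current = F(pi)
        move = None
        for i in range(N):
            for j in range(N):
                feasible = i != j and pi[i] + eps <= 1.0 and pi[j] - eps >= 0.0
                if feasible and F(shifted(pi, i, j, eps)) < current:
                    move = (i, j)
                    break
            if move is not None:
                break
        if move is None:
            return pi  # no eps-exchange decreases F: local optimum
        i, j = move
        candidate = shifted(pi, i, j, line_search(F, pi, i, j, tol))
        if F(candidate) >= current:
            candidate = shifted(pi, i, j, eps)  # fall back to the known decrease
        pi = candidate
    return pi
```

Every exchange keeps the components summing to one, so the search never leaves the simplex. On a non-convex function the result is only a local optimum, which is the same caveat the text raises.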

Note that if the C vector approximation of $V(\pi)$ were exact,
any local minimum of $F(\pi, C)$ would be a global minimum due to the
convexity of $V(\pi)$. While this is not guaranteed in the approximation, one
could take random samples of $\pi$ in an attempt to find a vector yielding
a lower value of $Z'$ than the local minimum, or evaluate $Z'$ for all $\pi$
vectors whose components are multiples of $1/n$ where $n$ is large if the
result min $Z' = 0$ is obtained.

When introducing an unsatisfied constraint, it is recommended
that the dual simplex method be used to solve the resulting program,
which is already dual feasible.

The sequence of min Z values generated by Algorithm 1 is
non-decreasing, bounded above, and hence must have a limit. It is an open
question as to whether this limit is the true min Z or, in particular,
if the sequence of $Z'$ values in Algorithm 2 tends to zero. Consider the
sequence of linear programs solved by Algorithm 1 and assume the number
of equations in each equals the number of components in the C vector
plus one. It has already been shown that it will not exceed this
number, and if it is less, additional constraints with all coefficients
being zero may be added. Consider also the sequence of matrices formed
by the probability vectors that generate these constraints. Since these
are bounded above, these matrices, and consequently the set of linear
programs for Algorithm 1, must have a convergent subsequence. Consider
now the sequence of constraints generated by this sequence in Algorithm 2.
By the same argument this sequence must have a convergent subsequence.

In this latter sequence, either $F(\pi, C) \to 0$ or else the cost coefficient
in the pivot column tends to zero, for if not the increase in min Z
would not tend to zero, which is impossible since min Z is bounded above.

If the sequence of $F(\pi, C)$ values generated by Algorithm 2 did
not appear to tend to zero after many iterations while the change
in min Z did appear to tend to zero, some possible ways out are as
follows. First, one may sample a large number of probability vectors
and find one which would give the largest increase in Z on a single
pivot. Second, one may search all probability vectors that are multiples
of $1/n$ where $n$ is a large number and find the one which gives the largest
increase in Z for one pivot.

It should be noted that if the sequence of $Z'$ values obtained
in Algorithm 2 does not tend to zero, then one has a situation somewhat
analogous to cycling in the dual simplex method. Since cycling almost
never occurs in the primal simplex method, there appears to be some
basis for thinking that the sequence of $Z'$ values would tend to zero
the majority of times.

One could of course only consider constraints generated by
probability vectors whose components are multiples of $1/n$. By imposing
a lexicographic ordering, one could ensure a true optimum in a finite
number of steps.
VII. Bounds on Accuracy


In solving the non-linear program (17-18) in Step 4 of the
algorithm to find the most unsatisfied constraint of the linear program
(12-15), one may wish to terminate the program when $Z' \ge -\delta$ rather than
for $Z' \ge 0$, where $\delta$ is a small positive number. If so, the value of Z
obtained for (12) will be less than the true minimum for Z since the
program has been optimized for only a subset of the constraints.
However, it is easy to see from (12) and (13) that increasing $C_0$ by $\delta/(1-\beta)$
yields a feasible solution and increases Z by that same amount.
Consequently, this feasible solution would come to within $\delta/(1-\beta)$ of minimizing Z.

The question now arises as to how close $\hat{V}(\pi)$, the Maclaurin
series approximation to $V(\pi)$, is to the true value of $V(\pi)$. To answer
this consider the operator $A$ defined in equation (7) and define:

$$ \| Au - u \| = \max_{\pi} | Au(\pi) - u(\pi) | \tag{19} $$

Since the operator $A$ is a contraction mapping with $\| Au - Av \| \le
\beta \| u - v \|$, it can be shown that $\| A^{n+1} u - A^n u \| \le \beta^n \| Au - u \|$ and
$\| A^n u - u \| \le (1 - \beta^n) \| Au - u \| / (1 - \beta)$. Since $V(\pi) = \lim_{n \to \infty} A^n u(\pi)$
for any $u$, it follows that

$$ | \hat{V}(\pi) - V(\pi) | \le \| A\hat{V} - \hat{V} \| / (1 - \beta) \tag{20} $$

One could find a local maximum to $| A\hat{V} - \hat{V} |$ by an incremental
procedure similar to that used to find the most unsatisfied constraint
to introduce into the linear programming problem. Alternatively, one
could evaluate (20) for all possible probability vectors whose
components are multiples of $1/n$.
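
The enumeration over probability vectors whose components are multiples of 1/n can be sketched as below. The names `simplex_grid` and `grid_error_bound` are hypothetical, and `residual` stands for a user-supplied function returning the difference between the one-step operator applied to the approximation and the approximation itself at a given vector. Since only grid points are examined, the result is an estimate of the bound and could understate the true maximum between grid points.

```python
from fractions import Fraction


def simplex_grid(N: int, n: int):
    """Yield every N-component probability vector with entries k/n."""
    def rec(prefix, remaining):
        if len(prefix) == N - 1:
            yield prefix + [remaining]
            return
        for k in range(remaining + 1):
            yield from rec(prefix + [k], remaining - k)

    for counts in rec([], n):
        yield [Fraction(c, n) for c in counts]


def grid_error_bound(residual, N: int, n: int, beta: float) -> float:
    """Largest |residual| over the grid, divided by (1 - beta)."""
    worst = max(abs(residual(pi)) for pi in simplex_grid(N, n))
    return float(worst) / (1.0 - beta)
```

The grid has (n + N - 1 choose N - 1) points, so the enumeration is practical only for a modest number of states or a coarse grid.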
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1 Dr.  Leon  H.  Nawrocki 

U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
1300  Wilson  Blvd. 

Arlington,  VA  22209 


1 Dr.  Joseph  Ward 

U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
1300  Wilson  Blvd. 

Arlington,  VA  22209 

1 HQ  USAREUR  & 7th  Army 
ODCSOPS 

USAREUR  Director  of  GED 
APO  New  York  09403 

1 AIR  Field  Unit  - Leavenworth 
Post  Office  Box  3122 
Fort  Leavenworth,  KS  66027 

1 Mr.  James  Baker 

U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
1300  Wilson  Blvd. 

Arlington,  VA  22209 

1 Dr.  Milton  S.  Katz,  Chief 

Individual  Training  & Performance 
Evaluation 

U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
1300  Wilson  Blvd. 

Arlington,  VA  22209 


Air  Force 

1 Research  Branch 
AF/DPMYAR 

Randolph  AFB,  TX  78148 

1 Dr.  G.A.  Eckstrand  (AFHRL/AST) 
Wright  Patterson  AFB 
Ohio  45433 

1 Dr.  Ross  L.  Norgan  (AFHRL/ASE) 
Wright  Patterson  AFB 
Ohio  45433 

1 AFHRL/DOJN 
Stop  #63 

Lakeland  AFB,  TX  78236 

1 Dr.  Martin  Rockway  (AFHRL/TT) 
Lowry  AFB 
Colorado  80230 
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1 Instructional  Technology  Branch 
AF  Human  Resources  Laboratory 
Lowry  AFB,  CO  80230 

1 Dr.  Alfred  R.  Fregly 
AFOSR/NL 

lAOO  Wilson  Blvd. 

Arlington,  VA  22209 

1 Dr.  Sylvia  R.  Mayer  (MCIT) 

Headquarters  Electronics  Systems 
Divis ion 

LG  Ha ns com  Field 
Bedford,  MA  01730 

1 Capt.  Jack  Thorpe,  USAF 
Flying  Training  Division 
AFHRL/FT 

William  AFB,  AZ  85224 

1 AFHRL/PED 
Stop  #63 

Lackland  AFB,  TX  '’8236 


Marine  Corps 
1 Director 

Office  of  Manpower  Utilization 
Headquarters,  Marine  Corps 
Code  MPU) 

MCB  (Building  2009) 

Wuantico,  VA  22134 

1 Dr.  A.  L.  Slofkosky 

Scientific  Advisor  (Code  RD-1) 
Headquarters,  U.S.  Marine  Corps. 
Washington,  D.C.  20380 

1 Chief,  Academic  Department 
Education  Center 
Marine  Corps  Development  and 
Education  Command 
Marine  Corps  Base 
Quantico,  VA  22134 

1 Mr.  E.  A.  Dover 

2711  South  Veitch  Street 
Arlington,  VA  22206 


Coast  Guard 

1 Mr.  Joseph  J.  Cowan,  Chief 
Psychological  Research  Branch 
(G-P-1/62) 

U.S.  Coast  Guard  Headquarters 
Washington,  D.C.  20590 


Other  POD 

1 Military  Assistant  for  Human  Resources 
Office  of  the  Secretary  of  Defense 
Room  3D129,  Pentagon 
Washington,  D.C.  20301 

1 Advanced  Research  Projects 
Administrative  Services 
1400  Wilson  Blvd. 

Arlington,  VA  22209 
ATTN:  Ardella  Holloway 

1 Dr.  harold  F.  O'Neil,  Jr. 

Advanced  Research  Projects  Agency 
Human  Resources  Research  Office 
1400  Wilson  Blvd. 

Arlington,  VA  22209 

1 Dr.  Robert  Young 

Advanced  Research  Projects  Agency 
Human  Resources  Research  Office 
1400  Wilson  Blvd.

Arlington,  VA  22209 

12  Defense  Documentation  Center 
Cameron  Station,  Building  5 
Alexandria,  VA  22314 
ATTN:  TC


Other  Government

1 Dr.  William  Gorham,  Director
Personnel  Research  and  Development  Center
U.S.  Civil  Service  Commission
1900  E Street,  N.W.
Washington,  D.C.  20415

1 Dr.  Vern  Urry

Personnel  Research  and  Development 
Center 

U.S.  Civil  Service  Commission 
1900  E Street,  N.W. 

Washington,  D.C.  20415 

1 Dr.  Erik  McWilliams,  Director 
Technological  Innovations  in 
Education  Group 
National  Science  Foundation 
1800  G Street,  N.W.,  Room  W 6'0 
Washington,  D.C.  20550 

1 Dr.  Richard  C.  Atkinson 
Deputy  Director 
National  Science  Foundation 
1800  G Street,  N.W. 

Washington,  D.C.  20550 

1 Dr.  Andrew  R.  Molnar 

Technological  Innovations  in 
Education  Group 
National  Science  Foundation 
1800  G Street,  N.W. 

Washington,  D.C.  20550 

1 Dr.  Marshall  S.  Smith 
Assistant  Acting  Director 
Program  on  Essential  Skills 
National  Institute  of  Education 
Brown  Building,  Room  815
19th  and  M Streets,  N.W. 

Washington,  D.C.  20208 

1 Dr.  Carl  Frederiksen 

Learning  Division,  Basic  Skills  Group 
National  Institute  of  Education 
1200  19th  Street,  N.W. 

Washington,  D.C.  20208 


Miscellaneous 

1 Dr.  Scarvia  B.  Anderson 
Educational  Testing  Service 
17  Executive  Park  Drive,  N.E. 
Atlanta,  GA  30329 


1 Dr.  John  Annett 

Department  of  Psychology 
The  University  of  Warwick
Coventry  CV4  7AL
ENGLAND 

1 Mr.  Samuel  Ball 

Educational  Testing  Service 
Princeton,  N.J.  08540 

1 Dr.  Gerald  V.  Barrett 
University  of  Akron 
Department  of  Psychology 
Akron,  OH  44325 

1 Dr.  Bernard  M.  Bass 
University  of  Rochester 
Graduate  School  of  Management 
Rochester,  NY  14627 

1 Dr.  Ronald  L.  Carver 
School  of  Education 
University  of  Missouri-Kansas  City 
5100  Rockhill  Road 
Kansas  City,  MO  64110 

1 Century  Research  Corporation 
4113  Lee  Highway 
Arlington,  VA  22207 

1 Dr.  A.  Charnes
BEB  512 

University  of  Texas 
Austin,  TX  78712 

1 Dr.  Kenneth  E.  Clark 
University  of  Rochester 
College  of  Arts  and  Sciences 
River  Campus  Station 
Rochester,  NY  14627 

1 Dr.  Allan  M.  Collins 

Bolt  Beranek  and  Newman,  Inc. 

50  Moulton  Street 
Cambridge,  MA  02138 

1 Dr.  Rene'  V.  Dawis 

University  of  Minnesota 
Department  of  Psychology 
Minneapolis,  MN  55455 




1 Dr.  Ruth  Day
Yale  University 
Department  of  Psychology 
2 Hillhouse  Avenue
New  Haven,  CT  06520 

1 ERIC 

Processing  and  Reference  Facility 
4833  Rugby  Avenue
Bethesda,  MD  20014

1 Dr.  Barry  M.  Feinberg 

Bureau  of  Social  Science  Res.,  Inc. 
1990  M Street,  N.W.

Washington,  D.C.  20036 

1 Dr.  Victor  Fields
Montgomery  College
Department  of  Psychology 
Rockville,  MD  20850 

1 Dr.  Edwin  A.  Fleishman 
Visiting  Professor 
University  of  California 
Graduate  School  of  Administration 
Irvine,  CA  92664 

1 Dr.  Robert  Glaser,  Co-Director 
University  of  Pittsburgh 
3939  O'Hara  St. 

Pittsburgh,  PA  15213 

1 Dr.  Henry  J.  Hamburger 
University  of  California 
School  of  Social  Sciences 
Irvine,  CA  92664 

1 Dr.  M.  D.  Havron 

Human  Sciences  Research,  Inc. 

7710  Old  Spring  House  Road 
West  Gate  Industrial  Park 
McLean,  VA  22101 

1 HumRRO  Central  Division 
400  Plaza  Building 
Pace  Blvd.,  at  Fairfield  Drive 
Pensacola,  FL  32505 

1 HumRRO/Western  Division
27857  Berwick  Drive 
Carmel,  CA  93921 
ATTN:  Library 


1 HumRRO  Central  Division/Columbus  Office 
Suite  23,  2601  Cross  Country  Drive 
Columbus,  GA  31906 

1 HumRRO/Western  Division
27857  Berwick  Drive 
Carmel,  CA  93921 
ATTN:  Dr.  Robert  Vineberg

1 HumRRO 

Joseph  A.  Austin  Building 
1939  Goldsmith  Lane 
Louisville,  KY  40218 

1 Dr.  Lawrence  B.  Johnson 

Lawrence  Johnson  & Associates,  Inc. 

2001  S Street,  N.W.,  Suite  502 
Washington,  D.C.  20009 

1 Dr.  Arnold  F.  Kanarick 
Honeywell,  Inc. 

2600  Ridge  Parkway 
Minneapolis,  MN  55413 

1 Dr.  Roger  A.  Kaufman 
203  Dodd  Hall 
Florida  State  University
Tallahassee,  FL  32306 

1 Dr.  Steven  W.  Keele 
University  of  Oregon 
Department  of  Psychology 
Eugene,  OR  97403

1 Dr.  David  Klahr 

Carnegie-Mellon  University
Department  of  Psychology 
Pittsburgh,  PA  15213 

1 Dr.  Ezra  S.  Krendel 

University  of  Pennsylvania
Wharton  School,  DH/CC 
Philadelphia,  PA  19174 

1 Dr.  Alma  E.  Lantz 
University  of  Denver 
Denver  Research  Institute 
Industrial  Economics  Division 
Denver,  CO  80210 

1 Mr.  Brian  McNally

Educational  Testing  Service 
Princeton,  NJ  08540 

1 Dr.  Robert  R.  Mackie

Human  Factors  Research,  Inc. 

6780  Gorton  Drive 

Santa  Barbara  Research  Park 

Goleta,  CA  93017 

1 Dr.  William  C.  Mann 

University  of  Southern  California 
Information  Sciences  Institute 
4676  Admiralty  Way 
Marina  del  Rey,  CA  90291

1 Dr.  Leo  Munday,  Vice  President 
American  College  Testing  Program 
P.O.  Box  168 
Iowa  City,  IA  52240

1 Dr.  Donald  A.  Norman 

Dept.  of  Psychology  C-009
University  of  California,  San  Diego 
La  Jolla,  CA  92093 

1 Mr.  A.  J.  Pesch,  President 
Eclectech  Associates,  Inc. 

P.O.  Box  178 

North  Stonington,  CT  06359 

1 Mr.  Luigi  Petrullo 

2431  North  Edgewood  St. 

Arlington,  VA  22207 

1 Dr.  Steven  M.  Pine
University  of  Minnesota
Department  of  Psychology
Minneapolis,  MN  55455 

1 Dr.  Dianne  M.  Ramsey-Klee 
R-K  Research  & Systems  Design 
3947  Ridgement  Drive 
Malibu,  CA  90265 

1 Dr.  Leonard  L.  Rosenbaum,  Chairman
Montgomery  College 
Department  of  Psychology 
Rockville,  MD  20850 

1 Dr.  Arthur  I.  Siegel
Applied  Psychological  Services
404  East  Lancaster  Ave.

Wayne,  PA  19087 


1 Dr.  Richard  Snow
Stanford  University 
School  of  Education 
Stanford,  CA  94305 

1 Dr.  C.  Harold  Stone
1428  Virginia  Ave. 

Glendale,  CA  91202 

1 Mr.  Dennis  J.  Sullivan 

c/o  HAISC,  Building  119,  M.S.  2 

P.O.  Box  90515 

Los  Angeles,  CA  90009 

1 Dr.  K.  W.  Uncapher

University  of  Southern  California 
Information  Sciences  Institute 
4676  Admiralty  Way 
Marina  del  Rey,  CA  90291

1 Dr.  Benton  J.  Underwood 
Northwestern  University 
Department  of  Psychology 
Evanston,  IL  60201 

1 Dr.  Carl  R.  Vest 

Battelle  Memorial  Institute 
Washington  Operations 
2030  M Street,  N.W. 

Washington,  D.C.  20036 

1 Dr.  David  J.  Weiss

University  of  Minnesota 
Department  of  Psychology 
N660  Elliott  Hall 
Minneapolis,  MN  55455 

1 Dr.  K.  Wescourt 
Stanford  University 
Institute  for  Mathematical  Studies 
in  the  Social  Sciences 
Stanford,  CA  94305 

1 Dr.  Anita  West 

Denver  Research  Institute 
University  of  Denver 
Denver,  CO  80210 

1 Dr.  Kenneth  N.  Wexler
University  of  California 
School  of  Social  Sciences 
Irvine,  CA  92664 

1 Dr.  John  J.  Collins
Vice  President 
Essex  Corporation 
6305  Caminito  Estrellado 
San  Diego,  CA  92120 

1 Dr.  Patrick  Suppes,  Director

Institute  for  Mathematical  Studies 
in  the  Social  Sciences 
Stanford  University 
Stanford,  CA  94305 